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Abstract — The design of genetic networks with specific func- 
tions is one of the major goals of synthetic biology. However, 
constructing biological devices that work "as required" remains 
challenging, while the cost of uncovering flawed designs exper- 
imentally is large. To address this issue, we propose a fully 
automated framework that allows the correctness of synthetic 
gene networks to be formally verified in silico from rich, high 
level functional specifications. 

Given a device, we automatically construct a mathematical 
model from experimental data characterizing the parts it is 
composed of The specific model structure guarantees that 
all experimental observations are captured and allows us to 
construct finite abstractions through polyhedral operations. 
The correctness of the model with respect to temporal logic 
specifications can then be verified automatically using methods 
inspired by model checking. 

Overall, our procedure is conservative but it can filter 
through a large number of potential device designs and select 
few that satisfy the specification to be implemented and tested 
further experimentally. Illustrative examples of the application 
of our methods to the design of simple synthetic gene networks 
are included. 

I. INTRODUCTION 

Synthetic biology is an emerging field that focuses on the 
rational design of biological systems. A number of biological 
devices - engineered gene networks that function as switches, 
oscillators, counters, logic gates, etc. have been designed, 
implemented, and validated experimentally, demonstrating 
the feasibility of the approach (see [22] for a review). 
However, success stories have largely been the result of 
extensive manual effort, application of various modeling and 
analysis techniques, and trial and error experimentation. As 
the field matures, real world applications are sought requiring 
a systematic approach that enables the implementation of 
complicated designs into functionally correct devices with 
less experimental work. 

One such approach has been enabled by biological stan- 
dards [17] and (online) part libraries [1] and involves the 
modular design and construction of devices from biological 
parts - genetic sequences known to function as promoters, 
ribosome binding sites, coding sequences, etc. This, in turn, 
has allowed the development of Bio-Design Automation 
(BDA) platforms, such as Clotho [11], which provide an 
environment where online libraries can be accessed, devices 
can be designed using a graphical interface and checked 
against rules of correct assembly, while the construction 
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process is completed automatically by liquid handling robots. 
Even so, current BDA platforms asses potential devices only 
in terms of their assembly feasibility and not based on 
their correctness with respect to specifications of required 
function. 

Designing biological devices that work "as required" re- 
mains challenging and, to minimize costly experimentation, 
it is usually approached through modehng (see [9] for a 
review of modeling formalisms). A realistic model is needed 
to guide design efforts but such models are often hard 
to analyze. In addition, estimating model parameters may 
require extensive experimental data which is rarely available, 
although characterizations resulting in biological part data 
sheets [7] are currently ongoing. Besides selecting a realistic 
yet analytically tractable model, specifying the required de- 
vice behavior in a formalism that is both general and allows 
for automatic analysis procedures is a separate challenge. In 
this paper, we propose a fully automated framework for in 
silico verification of synthetic gene networks from rich, high 
level specifications expressed as temporal logic formulas. 

Temporal logics [8] are customarily used for specifying 
the correctness of digital circuits and computer programs. 
Due to their expressivity and resemblance to natural language 
they have gained popularity in other areas including the 
specification and analysis of qualitative behavior of genetic 
networks [2], [4], [3]. There also exist off-the-shelf algo- 
rithms for verifying the correctness of a finite state system for 
a temporal logic specification (model-checking). However, 
such finite models are usually too simple to capture the 
dynamics of genetic networks with the detail necessary for 
design applications. 

In our previous work [26] we used piecewise affine (PWA) 
systems as models of gene networks [25]. Such systems are 
globally complex and can approximate nonlinear dynamics 
with arbitrary accuracy [18], which makes them realistic 
models. They are also locally simple, which allowed us to 
analyze them formally from temporal logic specifications 
through a procedure based on the construction and refinement 
of finite abstractions through polyhedral operations [26] and 
model-checking [8]. In this paper, we use a class of models 
that is inspired by PWA systems but is more general. To 
account for the variability due to experimental conditions 
and the uncertainty inherent in biological systems we allow 
model parameters to vary in some ranges. We develop a 
procedure for the automatic construction of such models 
from part characterization data with the guarantee that all ex- 
perimental observations can be reproduced by the identified 
model. We also extend our methods from [26] and integrate 



them with our model identification procedure, which leads 
to a fully automatic framework for specifying and verifying 
the correctness of genetic networks constructed from parts. 
Our approach can be used both to verify individual device 
designs or to automatically explore the space of potential 
device designs that can be constructed from characterized 
parts, available from libraries. 

In terms of analysis, our method is related to tools such 
as the Genetic Network Analyzer (GNA) [10] and 
RoVerGeNe [5], which study biological systems using tem- 
poral logic specifications but usually focus on the analysis of 
separate devices for which a model is available. Instead, our 
procedure can explore different devices constructed from a 
set of parts, while models are derived automatically from 
part characterization data. In that respect, our approach 
resembles methods such as [19] that focus on the generation 
and analysis of analog and mixed- signal circuit models 
from simulation traces. Unlike other gene network modeling 
approaches, we do not enforce sigmoidal (Hill) regulation 
functions but construct models which capture all experi- 
mental observations and resemble uncertain parameter PWA 
systems. Our procedure is therefore also related to methods 
for the identification of PWA systems (see [16] for a review). 
Such tools address the threshold reconstruction problem 
more rigorously but only identify fixed parameter models 
and require device experimental data which is not usually 
available during design. This motivates the development 
of our procedure for constructing device models from part 
characterization data. 

Automatic construction of device models from kinetic pa- 
rameters of parts has been implemented in Asmparts [23], 
SynBioSS [15], GEC [21] and GenoCAD [6]. These tools 
can be used to search for devices with specific functions by 
numerically simulating potential device models and assessing 
the "goodness" of the resulting trajectories but face three 
main challenges. First, it is assumed that parameters of 
individual parts are known which is not always the case. 
In this paper, we only assume that protein degradation rates 
are known, motivated by the fact that such information is 
often available from literature or can be predicted com- 
putationally [14]. However, we compute expression rates 
from experimental data, which makes our approach easier to 
apply in practice. Second, the models resulting from other 
procedures are hard to analyze and, therefore, numerical 
simulation is used to asses device behavior. This can be 
unfeasible for large state spaces and, in general, does not lead 
to any formal guarantees. We construct models which are rich 
enough to capture all experimental observations but can also 
be analyzed formally using model-checking based methods, 
which we have developed previously [26] but extend in this 
paper. Finally, no general formalism for specifying required 
device behavior is provided in the outlined procedures and 
only solutions for specific cases are considered. We formalize 
high level specifications in linear temporal logic which is 
both rich (i.e. it captures many properties of interest) and 
"user-friendly" (i.e. it resembles natural language). 

The remainder of this paper is organized as follows. 
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Fig. 1. In the simplified expression mechanism we consider, a gene 
g is expressed from promoter p at rate /3p to produce protein, whose 
concentration is denoted as Xg. The protein degrades at rate ag. The 
promoter might be regulated by protein Xg/ . 

In Sec. [n] we formulate the main problem and present 
an overview of our approach. In Sec. |lll] we discuss the 
automatic construction of models from part characterization 
data. We focus on the formal analysis of those models 



through the construction of finite abstractions in Sec. IV 



In Sec.|V]we present illustrative examples of the application 
of our framework to the design of a bistable synthetic gene 
network. We conclude with final remarks and directions of 
future work in Sec. |VlJ 

Throughout the rest of the paper we use the following 
notation. Given a set S we use \S\ and 2'^ to denote 
the cardinality and the powerset (the set of subsets) of S, 
respectively. For a set 5 C and a scalar A G R, we use 
AS' to denote the set of elements from S multiplied by A. 
Given sets S and S' we denote their Minkowski (set) sum by 
S-\-S'. Given poly tope X, we denote the set of vertices of X 
by V{X) and their convex hull as X = hull{{v e V{X)}). 

II. PROBLEM FORMULATION 

In this section we formulate the problem of verifying the 
correctness of a gene network (biological device) from high 
level specifications. We start by discussing our simplified 
view of the biochemistry involved in gene expression, the 
parts we consider as basic building blocks of all devices and 
the experimental data that we assume is available. 

We consider only two basic types of biological parts - 
sequences of DNA that either function as promoters or code 
for proteins (we refer to such sequences as genes). This is 
the minimal set of parts required to define the interactions 
in gene networks but the methods we subsequently develop 
can be extended easily for more detailed formulations. We 
assume that each gene codes for a single protein which 
degrades at a certain rate and whose concentrations can be 
measured directly in experiments. We treat protein produc- 
tion as a single step process, which is sufficient to capture 
transcriptional regulation (see Fig. [TJ. 

For a protein to be produced, its corresponding gene must 
be expressed, which requires placing it after a promoter (we 
assume that other sequences required for correct expression 
such as a ribosome binding site are already contained within 
a gene). The simplest device we consider contains a single 
promoter and expresses a single gene to produce a single 
protein (Fig.[T]). By placing multiple genes on the same pro- 
moter and including additional promoters, more complicated 
devices can be assembled. We assume that, in a device, a 
gene is expressed from a single promoter - such assembly 
constraints are handled by platforms such as Clotho [11]. 
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Fig. 2. |(a)| Experimental observations in the form of a range of values 
of the rate of expression /3pi characterize the constitutive promoter pi. |(b)| 
The regulated promoter p2 is characterized by experimental observations of 
the rate of expression f3p2 at different concentrations of the regulator (in 
this case, repressor) Xgf. 

We differentiate between constitutive and regulated pro- 
moters. A protein is always produced if its gene is expressed 
from a constitutive promoter (i.e. the promoter is always 
"on"), while expression from a regulated promoter varies, 
depending on the concentrations of proteins or chemicals 
(inducers), called regulators. In general, a promoter can 
be regulated by several regulators but, for simplicity of 
presentation in this paper we consider only the case of a 
single regulator per promoter, although our methods can also 
be extended for the more general case. 

We consider only devices built from characterized parts - 
genes and promoters for which experimental data indicative 
of their performance is available. A gene is characterized 
by the degradation rate (or equivalently the half-life) of 
the protein it codes for, which we assume is a fixed and 
known value. Protein degradation rates are often available 
from literature or can be predicted computationally [14]. A 
promoter is characterized by a rate of expression, which we 
assume is the same for all genes expressed from it. However, 
because of variability in experimental conditions and the 
inherent uncertainty of biological systems, we assume that 
the rate of expression from a promoter varies in a certain 
range. For a constitutive promoter, the characterization data 
is simply a set of experimentally measured expression rates 
(Fig. 2(a)| ), while for a regulated promoter, we assume that 
experimental measurements of the expression rate at different 



concentrations of the regulator are available (Fig. 2(b) i. Mea 



suring expression rates directly can be challenging and such 
data is usually obtained by simultaneously measuring the 
concentration of a regulator and a gene expressed from the 



regulated promoter [24]. In Sec. Ill we provide a procedure 



for converting such measurements to the expression rates 
data shown in Fig. [2] which we assume is available for all 
characterized promoters. 

Given a device, we are interested in studying the dynamics 
of the concentrations of proteins expressed from its genes. 
Let G denote the set of genes where N — |G| is the 
device size. We use Xg to denote the concentration of the 
protein expressed from gene g G G, which is bounded in 
a physiologically relevant range x 
hyper-rectangle 
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is the feasible state space of the device, where each x G X 
is a vector of the concentrations of all proteins Xg,g G G. 



Given an initial state x{0) G X the concentrations of species 
from G evolve over time and produce an infinite sequence 
x(0), x(l), . . . called a trajectory, where x{k) G A* is the 
state at step k — 1, . . .. 

We define a set of atomic propositions 11 as a set of linear 
inequalities 



n = {. 



1, . . . , E:}, TTi = {x G A* I cJx-\-di < 0}. (2) 



In other words, each atomic proposition tt^ partitions the 
feasible space X into a satisfying and violating subset for tt^. 
Given a state x G we write x 1= tt^ if and only if cfx+di < 
(i.e X satisfies tt^). A trajectory x(0), x(l), . . . produces an 
infinite word w{0), w{l), . . . where w{k) = {tt G 11 | x{k) \= 
7t} is the set of propositions satisfied at step k. 

To specify temporal logic properties of trajectories of 
the system we use Linear Temporal Logic [8]. Informally, 
LTL formulas over 11 are inductively defined by using 
the standard Boolean operators (e.g., -> (negation), V (dis- 
junction), A (conjunction)) and temporal operators, which 
include Q ("next"), U ("until"), □ ("always"), and {> 
("eventually"). LTL formulas are interpreted over infinite 
words, as those generated by the system. For example, 
the word w{0),w{l), . . . where w{0) — {7Ti,7T2},w{1) — 
{tti, 7r2, TTs}, and w{2),w{3), . . . — {7Ti,7T4} satisfies for- 
mulas Dtti, <0>7r3, <C>n(7ri A7T4), and 7T2^7T4 but violates □7r2 
and <C>7r5. We say that a trajectory x(0), x(l), . . . satisfies 
an LTL formula cj) if and only if the corresponding word 
w{0),w{l),... satisfies cj). The device satisfies cj) from a 
given region X C X if and only if all trajectories originating 
in X satisfy the formula. 

We are now ready to formulate the main problem we 
consider in this paper: 

Problem 1 : Given a device constructed from characterized 
parts and a specification expressed as an LTL formula over 
a set of linear inequalities in the concentrations of proteins, 
determine if the device satisfies the specification. 

Our approach to Problem [T] consists of two main steps. 
Given a device, we first use the characterization data avail- 
able for its parts to automatically construct a mathematical 



model by applying the procedure we develop in Sec.|III| The 
particular model structure we enforce allows us to guarantee 
that all experimental observations can be reproduced by the 
identified model. As a second step, we also exploit this 
structure to analyze the model from the temporal logic spec- 
ification using a method inspired by model-checking, which 
we described in [26] but review and extend in Sec. |IV[ Our 
analysis procedure results in the computation of a satisfying 
(respectively, violating) region - a subset of the system's state 
space from which all trajectories are guaranteed to satisfy 
(respectively, violate) the specification. A device design is 
considered "good" if analysis reveals a large satisfying region 
and an empty or small violating region, while a design is 
"bad" whenever a substantial violating region is found. Given 
a library of characterized parts, our overall procedure can 
serve to evaluate a large number of possible device designs 
in order to select few for further experimental testing. 



III. MODEL CONSTRUCTION 

In this section, we describe our procedure for the auto- 
matic construction of device models from part character- 
ization data. As it will become clear later, the resulting 
models capture all experimental observations and take the 
form of uncertain parameter systems with different dynamics 
in different regions of the state space. 

In Sec. [n] we considered a simplified mechanism of gene 
expression (Fig.[T]). A gene g was expressed from promoter p 
at rate /3p to make protein whose concentration was denoted 
by Xg and which degraded at rate ag. We can express the 
dynamics of protein concentration as 



+ 1) = agXg{k) + /3p. 



(3) 



In the problem formulation of Sec. |TT] we assumed that, for 
each gene (protein) g, ag has a fixed value which is known 
for characterized parts, but jSp is allowed to vary in some 
range, which is unknown and must be computed from the 
promoter characterization data (Fig. [2]l. 

We first consider the computation of a range Bp c M. 
for a constitutive promoter p, such that (3p G Bp in Eqn. (jSj). 
Then, we consider a regulated promoter where /3p G Bp{xg') 
(i.e. the range of allowed rates Bp{xg') C M is, in general, 
a function of the regulator concentration Xg'). For both, 
we first discuss the case when experimental observations of 
expression rates are directly available and later extend our 
procedure to compute such measurements indirectly from 
more realistic experimental data. We conclude the section 
by discussing the construction of models for general devices 
composed of a number of characterized parts. In developing 
our model identification procedure, we seek to compute 
a range of expression rates that is tight but contains all 
experimental measurements. This leads to the construction of 
models that can reproduce all observed behavior but are not 
overly general, which would make their subsequent analysis 
in Sec. |IV] too conservative. In the following, we denote 
measured expression rates and protein concentrations from 
promoter p and gene g by /3p and Xg, respectively. 

A. Constitutive promoter 

If promoter p is constitutive, expression rate I3p does not 
depend on the concentrations of other species in the system 
(there are no regulators) but varies in range Bp. If a data 
set Dp — {^p(l), . . . ^p(n)} of experimentally measured 
expression rates is available (Fig. |2(a) i, this range is simply 

Bl = [min{Dl),max{D'p)]. (4) 

This captures all experimentally observed rates and extrapo- 
lates under the assumption that any rate between the minimal 
and maximal observed one is also possible for the system. 

In general, expression rates cannot be measured directly 
and must be computed from protein concentration measure- 
ments [24]. If gene g is expressed from constitutive promoter 
p, given a finite trajectory fragment Xg(0), ),..., Xg(n+ 
1) observed in experiments, from Eqn. ^ it follows that the 
expression rate j3p{k) observed at step /c = 0, . . . , n is 

^p{k) ^ Xg{k + 1) - agXg{k). (5) 



The computation outlined above works when measure- 
ments of protein concentrations Xg from individual cells 
are available. In experimental settings, it is often convenient 
to use techniques where protein concentrations for a large 
number of cells are measured simultaneously. In this case, 
individual cells are not identified uniquely and we must 
allow the possibility of a cell making a transition from the 
lowest observed concentration at a step k to the highest one 
observed at step /c + 1 and vice versa. Then, a minimal 
and a maximal possible rate is computed at each step and 
included in the set Dp but the rest of the computation remains 
the same. Single cell experimental techniques lead to the 
identification of tighter expression rate ranges and we only 
consider such measurements through the rest of this paper. 

B. Regulated promoter 

For a regulated promoter p, the rate of expression /3p varies 
in a range Bp{xg'), which is a function of the regulator 
concentration Xg'. Range Bp{xg') is unknown and must be 
computed from the available promoter characterization data 
{l3p,Xg') G Dp (i.e. Dp is a set of expression rates measured 
at different repressor concentrations as in Fig. |2(b)|l. In the 



following, we focus on the construction of the set 

Bp = m,Xg,) I Xg, e \xJ^,X^r\fip e Bp{Xg,)}. (6) 

This allows us to compute Bp(xg') at arbitrary concen- 
trations Xg' as the slice of Bp at Xg' (i.e. Bp{xg') — 
{(3p I {l3p,Xg') G Bp}). By constructing the tightest Bp 
that contains all experimental measurements (i.e. Dp C Bp), 
we guarantee that the model we identify can reproduce all 
observed behavior but our subsequent analysis in Sec. IV is 
not overly conservative. 

It is most straightforward to extend Eqn. Q and compute 
a constant range (Fig. |3(a)| ) as 

Bp = [min{Dp),max{Dp)] x [x^'^, x^"^] where (7) 

Dp = {^p I 3xg' e \x^r,x^r\ C^p^xg') e Dp} (8) 

In this case, the range Bp(xg') is the same for all regulator 
concentrations Xg' and, although it captures all experimental 
data, it includes expression rates from both an activated and 
repressed p which makes the overall method conservative. 
To compute a tighter range, we introduce a set of thresh- 



l, < x^'^^ for alH = 1, 



1. We discuss 



olds O'g, such that x^'"" < 

and < e'gV for all z = 1, _ 

the computation of these thresholds in Sec. III-C and, in 
this subsection, we focus on the expression rates observed 
when regulator concentration falls in the region between two 
thresholds (see Fig. [Sj. For z = 1, . . . , n^' — 1, we define the 
subset 



Dl 



e Dr 



T , < X , 



< oit'}. 



By applying Eqn. (j?]) locally in each region, we can compute 



a piecewise constant range (Fig. 3(b) I as 







Bp — 


[J Bp where 

2=1 


b; = 


[min{Dp) , max 



and Dp is computed for each subset as in Eqn. (jsj). 

A piecewise constant range captures all experimental ob- 
servations while allowing different local ranges for different 
regulator concentration regions. This procedure also leads 



to some computational advantages in Sec. IV but it might 



still be too conservative (i.e. the volume of Bp might be too 
large). We can also compute 



Bi = hull{Di 



(9) 



which is the smallest convex set containing all observed 



therefore. Bp has minimal volume. As it will become clear 
later, for such convex hull range Bp{xg') cannot be com- 
puted easily unless additional thresholds at each vertex are 
introduced, which leads to complications. 

As a compromise, a piecewise linear range as in Fig. 



3(c) has, in general, smaller volume than the piecewise 



constant range from Fig. 3(b) and does not require additional 



thresholds as the convex hull range from Fig. 3(d) Such a 



range can be computed by enumerating all trapezoids that 
have the two thresholds and two of the supporting planes 
from the convex hull range as sides, and selecting the one 
with the smallest volume. Under the additional assumption 
that expression rates are measured only at regulator concen- 



trati ons th at fall on thresholds, the procedures from Figs. |3(c) 
and 



3(d) 



result in the same set Br 



computed using Eqn. ([9]). 



where each Bp can be 



For a piecewise linear range (Fig. |3(c)| ), given regulator 
concentration Xg' such that Xg' — X9g, + (1 
some i — 1, . . . ,ng' — 1 and A G [0, 1], we have 



X)eiV for 



Bp{xg>)^xBp{e\) + {i-x)Bp{eit' 



(10) 



As for constitutive promoters, when expression rates are 
not available directly, they can be computed from protein 
concentration measurements. Given genes g,g' and a pro- 
moter p, such that g is expressed from p and g' regulates 
p, and a trajectory fragment x(0), x{l), . . . , x{n + 1) where 
x{k) — {xg{k),Xg'{k)) is a vector of regulator and protein 
concentrations, we have 



{0p{k),Xgik)) I X{k) = {Xg{k),Xg^{k)), (11) 



/3p{k) = Xg{k + 1) - agXg{k), /c = 1, 



C. Device models 



To summarize the construction of models using the pro- 
cedures we discussed so far, we consider a device consisting 
of a set of genes G and promoters P (see the problem 
formulation in Sec.jll]). For notational simplicity, we assume 
that for z = 1,...,A^, gene gi ^ G h expressed from 
promoter pi G P, which is either constitutive or regulated 
by the protein produced by gene g[ G G. We assume that, 
for each gene g G G, we have at least two thresholds (i.e. 



n„ > 2) where 



= X 



^''^ and Og' = x^'''' 



(i.e. the 



boundaries of the state space X introduced in Sec. [IT] are 
thresholds). Computing the set of thresholds is not the focus 
of this paper but related methods are available [12]. Here, 
we implement a sampling procedure where, out of a number 



expression rates in each region (shown in Fig. 3(d) I and, (flT 



of randomly generated thresholds, we select the subset of a 
given size that minimizes the volume of Bp. 



For a state x ^ Xi where x — {xg^, 



the 

dynamics of each component Xg are given by Eqn. (j3]) where 

Bi 



Pp. e 



,p if p is constitutive or 

Bp{xg'_) if p is regulated 



(12) 



It is important to note that the identified model can 
reproduce all experimental data used for part characteriza- 
tion. Consider a trajectory fragment used in Eqn. ([5]) or 



to respectively characterize a constitutive or regulated 
promoter. We can guarantee that the expression rate from 
the promoter, required to reach the concentration of the 
expressed protein observed at step k -\- 1 starting from the 
concentration observed at step k, is always in the allowed 
range. In Sec. IV we will show that the model structure is 



different for piecewise constant and piecewise linear ranges 
but allows the computation of finite abstractions through 
polyhedral operations, enabling the application of formal 
analysis techniques. 

Remark 1: Note that range Bp(9g,) is not well defined 
and might be different for the r egi ons th at share threshold 
9gf (for example, see Figs. 3(b) and 
rest of this paper, we only consider 
of regions. 



3(c) I and, thorough the 



states from the interior 



IV. FORMAL ANALYSIS 



In Sec. Ill we developed a procedure for the automatic 
construction of device models from part characterization 
data. All experimental measurements were captured in the 
resulting models by allowing expression rates to vary in 
certain ranges. In this section we show that, despite this 
uncertainty, finite quotients of the identified models can 
be constructed using polyhedral operations, which enables 
analysis through methods inspired by model checking. With 
the exception of Prop. [T| the material presented in this section 
is largely a review of our results from [26]. 

The state space X from Eqn. ([T]) is partitioned by the 
thresholds z = 1, . . . , of all genes g ^ G into a number 
of hyper-rectangular regions. We partition X further using all 
linear inequalities tt G 11 (Eqn. ([2])) and ignore the measure- 
zero set consisting of all boundariesj^ This results in a set 
of open polytopes G L such that, for all li,l2 ^ L, 
Xi^ n Xi^ — and \Ji(^lcI{Xi) — X, where d() denotes 
the closure of a set. We denote the set Ui^^Xi as X. Note 
that all states from a given region satisfy the same atomic 
propositions (i.e. for all xi,X2 £ Xi for some / G L and all 
TT G n, xi 1= TT if and only if X2 1= tt). 

We define two states as equivalent if and only if they 
belong to the same region Xi for some I G L. The finite, 
proposition preserving quotient induced by this equivalence 
relation is the transition system T — (Q, ^,Il,\=) where 

• Q — L is the finite set of states, 

^It is unreasonable to assume that equality constraints can be detected 
in practice and, in general, trajectories of the system do not start from or 
disappear in this measure-zero set. 



(a) Constant range 



(b) Piecewise constant range 



(c) Piecewise linear range 



(d) Convex hull range 



Fig. 3. Different methods for fitting experimentally observed expression rates (the same data is used for all procedures). Initial thresholds are shown as 
thick vertical lines. Additional thresholds for|(d)|are shown as thin vertical lines. Resulting regions from each procedures are shown in gray. 



• — >C Q X Q is the transition relation defined as 
{hih) > if and only if there exists a transition from 
a state in region Xi^ to a state in Xi^, 

• n is the set of atomic propositions from Eqn. ([2]), and 

• NC Q X n is the satisfaction relatiorj^ where, given 
I G L and tt G 11, / 1= tt if and only if, for all x ^ Xi, 

X 1= TT. 

From the definition of the transition relation — it follows 
that T simulates the infinite system identified through our 
procedure from Sec. Ill (in other words, T can produce 
any word that the infinite system can produce [20]). This 
allows us to guarantee that if an arbitrary LTL formula (j) 
is satisfied by T at state / G L, then all trajectories of the 
system originating in region Xi satisfy the formula. Note 
that when T does not satisfy (j) from state / we cannot say 
anything about the satisfaction of (j) from region Xi, which 
makes the overall method conservative. 

In [26] we developed an analysis procedure based on the 
construction, model checking and refinement of simulation 
quotients such as T. Our algorithm used model checking 
to partition the set of states L into set L*^ C L from 
which T satisfied an LTL formula (f) and L~^^ C L from 
which T satisfied the negation This allowed us to 
guarantee that all trajectories originating in the satisfying 
region X^ — IJzeL* none of the trajectories origi- 

nating in the violating region X^^ — Uzgl-* '^i satisfied (f). 
Both satisfying and violating trajectories originated in region 
X\{X'^ yj X^'^) and our algorithm implemented an iterative 
refinement procedure to try and separate them, in which case 
X^ and X^^ can be expanded. 

To apply our method from [26] (implemented as the 
software tool FaPAS) we need to be able to construct T, 
which reduces to the computation of its transitions — For 
all / G L, we denote the set of states reachable from Xi in 
one step as Post{Xi). Transitions of T can be computed as 

(/i, /a) if and only if Post{Xi^) n Xi^ ^ 0. (13) 

To show that T can be constructed, we show that 
Post{Xi^) n Xi^ is computable for all li,l2 ^ L. We use 
the notation introduced in Sec. |III] where each promoter, 
gene and regulator is denoted by G P and gi,gi G G, 
i — 1, . . . ,N, respectively. Given a state x G Xi for some 

^We abuse the notation and use symbol 1= to denote the satisfaction of a 
proposition by a state in the original infinite system and its abstraction T. 



/ G L such that x — {x 
dynamics are given by 



'ON , 



the overall system 



x{k-\-l) e Ax{k) -\- B{x{k)), (14) 

where A is the diagonal matrix of degradation rates A — 

ttpiv) and 



diag{ag^ , . 
B{x) 

Bi{Xg') 



Bi{xg' ) X ... X BN{xg' ) where (15) 



B' 



Bp^Xg') 



if Pi is constitutive or 
if Pi is regulated 



If the piecewise constant procedure from Sec. [Ill] is used, 
for all states xi,X2 ^ Xi,l ^ L, we have B{xi) — B{x2) — 
Bi. Therefore, the dynamics from Eqn. ( [T4] ) reduce to 

x{k + 1) G Ax{k) + Bi when x{k) e Xi (16) 

and Post{Xi) = AXi + Bi. 

Proposition 1: For the more general case when the piece- 
wise linear procedure from Sec. |lll] is used, for all / G L, 
Post{Xi) is convex and computable a^ 

Post{Xi) = hull{{Av + B{v) I V e V{Xi)}). 

Proof: See Appendix. ■ 
Following from the results presented so far, regardless 
of which procedure from Sec 
Post{Xi^) nXi^ is convex and computable for all h 



III is used, the intersection 



e L. 

Then, transitions can be computed using Eqn. (13 1 which 
completes the construction of T and, therefore, the satisfying 
and violation regions of the system identified in Sec. |III] 
can be computed. The relative volumes of those regions can 
be used to determine if a device satisfies the specification, 
which provides a solution to Problem [T| The same approach 
can also be used to compare different designs constructed 
from a set of parts based on their satisfaction of a common 
specification. To illustrate such an application, in Sec. [V|we 
use our method to design a synthetic gene network. 

V. DESIGN OF GENE NETWORKS 

To illustrate our method, we apply it to the design of 
a bistable gene network inspired by the "genetic toggle 
switch" [13], which has the topology shown in Fig. |4] We 
start by constructing a library of parts, which includes three 

^As mentioned in Remarklll the set B(v) might be different for different 
regions that share vertex v GV(A';) but from the index / G L it is always 
clear which B{v) is used for the computation. 



TABLE I 



Fig. 4. Toggle switch. Two repressor proteins are expressed from two 
regulated promoters and mutually repress each other. 

genes (denoted by gi, . . .g^) and three promoters (denoted 
by pi, . . .ps). We assume that the proteins produced from 
all three genes are stable and their degradation rates are 
negligible compared to the dilution due to cell growth, which 
leads to a protein half-life of 30 min - an estimate of the 
generation time of bacteria. All promoters are regulated and 
the protein produced by gene gi represses promoter pi. 

To characterize the promoters in the library, we need to 
obtain experimental data of their expression rates at different 
repressor concentrations as described in Sec. |ll] and III 
We follow the strategy from [24] where a characterization 
device similar to the one from Fig. [T] is used. It consists 
of an arbitrary reporter protein that is expressed from the 
regulated promoter to be characterized. The regulator protein 
is initialized at high concentration but is not expressecj^and, 
as a result, repressor concentration decreases over time due 
to degradation. By measuring both repressor and reporter 
concentrations simultaneously we can compute the promoter 
characterization data as in Eqn. (111. We use numerical 
simulation of stochastic differential equations to generate a 
number of trajectories for each characterization device in lieu 
of single cell experimental measurements (several sample tra- 
jectories for all three promoters are shown in the first column 
of Fig.jSj. The rates of expression from each promoter and a 
piecewise constant and piecewise linear ranges are computed 
from this characterization data as described in Sec.lTllland are 
shown in the second and third column of Fig. [5| respectively. 

We consider all "toggle switch" devices with topology as 
in Fig. |4] that can be constructed from the set of available 
parts. For device 1, gene g2 is expressed from promoter pi 
and gene gi is expressed from promoter p2- For device 2, g^ 
is expressed from pi and gi is expressed from p^. For device 
3, g2 is expressed from ps and gs is expressed from p2. 
For each device, we consider specifications (pi — (^Otti and 
02 = {>n7r2 where tti := x„. > 2x„. and 7T2 ■— 2x„. < x 



In other words, specification indicates that eventually 
and for all future times the concentration of protein Xg^ 
is at least twice greater than that of protein Xg^, while 02 
indicates the opposite. We seek a bistable device satisfying 
both specifications from different initial conditions. 

Analysis using the procedure described in Sec. [IV] leads 
to the computation of the relative volumes of the satisfying 
and violating regions for all three devices for each of the 
two specifications. Results for a model constructed using the 



piecewise linear procedure from Sec. Ill are presented in Ta- 
ble [l] (results obtained with the piecewise constant procedure 
are given in parentheses). Analysis indicates that only device 
3 is bistable, which is confirmed through simulations of the 

^experimentally, this is accomplished by expressing the regulator from 
an externally controlled promoter, which is switched off 



spec. 


(/.I = ^Dtti 


4)2 = 0Q7r2 


device 


satisfying 


violating 


satisfying 


violating 


1 


0%(0%) 


46.9%(44.7%) 


35.1%(29.2%) 


0%(0%) 


2 


0%(0%) 


88.8%(85%) 


88.8%(25.2%) 


0%(0%) 


3 


8.4%(7.4%) 


58.4%(47.2%) 


26.7%(20.7%) 


8.4%(7.4%) 



Stochastic differential equation models of all three devices 
(fourth column in Fig. [5]l. 

VI. CONCLUSION 

In this paper, we presented an automated procedure for the 
design of functionally correct synthetic gene networks from 
parts. We formalized high level specifications of required de- 
vice behavior as temporal logic formulas over linear inequali- 
ties in protein concentrations. We developed a procedure for 
the construction of device models from experimental data 
characterizing the different parts the devices were composed 
of. The identified models were related to PWA systems but 
allowed expression rates from promoters to vary in certain 
ranges and could capture all experimental observations. This 
model structure also allowed us to construct finite quotients 
through polyhedral operations. Such quotients could then 
be analyzed using methods inspired by model checking 
to compute a range of initial conditions from which all 
trajectories of the device model were guaranteed to satisfy (or 
violate) the specification. The relative sizes of those regions 
provided information about the correctness of a device design 
with respect to the specification. Our procedure could test 
individual, user-specified device designs or automatically 
search for correct devices by exploring the design space 
of devices constructed from a set of parts. Future research 
directions involve decreasing the conservatism of the method 
by quantifying the "likelihood" of different parameters and 
applying it to real experimental studies. 
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Appendix 

Given a convex region Xi, let x G Xi. Given vertices Vi G 
V{A:i), z = 1, . . . ,M, M = \V{A:i)\ we have x = E^=l >^iVi 
for some such that for all z = 1, .^M, < Ai < 1 and 
Ylfti — ^- From Eqns. ( 10 1 and \l5\ it follows that 

M M 
i=l 1=1 

From Eqn. ( [14] ) we have 

Post{x) e Ax + B{x) = 

M M 

= A^Ai^;^ + E(^Aii;i) = 

M 



Post(x) G hull{{Av + I G V(A'z)}). 

Similarly, let x' G + | v e V{Xi)}). Then, 

for some fii such that for alH = 1, . . . , M, < /x^ < 1 and 

Y^Af 1 



M 



i=l 

x' G Ax + = Postix) where 

M 



